PERF faster head, tail and size groupby methods #5533
@jreback I fixed this up. One slight API change: head now respects the original frame order (it doesn't if you use an apply).
Can you add some tests where the groups vary in size, and where the number you are asking for (e.g. head(3)) is greater than the number of rows in some or all groups?
Added tests for n <= 0 and n > max group size. There was already one for n between group sizes. (Though it's a small example, I think it covers...)
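For readers following along, the cases discussed above can be sketched roughly as follows. This is only an illustration on a made-up frame (the column names `A` and `B` and the values are assumptions, not taken from the PR's test suite), and it shows behaviour as I understand it in current pandas.

```python
import pandas as pd

# Hypothetical data: group "b" has 3 rows, group "a" has 2 rows.
df = pd.DataFrame({"A": ["b", "a", "b", "a", "b"], "B": [10, 20, 30, 40, 50]})
gb = df.groupby("A")

# n between the group sizes: at most 2 rows per group,
# returned in the original frame order rather than grouped by key.
first_two = gb.head(2)

# n larger than every group: every row comes back.
everything = gb.head(10)

# n == 0: no rows.
nothing = gb.head(0)

# The apply-based spelling selects the same rows, but it concatenates
# group by group, so the row order is not guaranteed to match the frame.
via_apply = df.groupby("A", group_keys=False)["B"].apply(lambda s: s.head(2))

print(first_two)
print(via_apply)
```

The point of the comparison is the ordering: `head` keeps the frame's own row order, whereas the `apply` version depends on how the per-group pieces are stitched together.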
    '''
    Returns first n rows of each group.

    Essentially equivalent to .apply(lambda x: x.head(n))
Can you double backquote the code (.apply(..)) so it is rendered as code?
Added some comments to the docstrings. I have another question regarding the docstring style:
@jorisvandenbossche Thanks for the comments, will update.
Fixed these. Mentioned the ascending arg to cumcount, which I've purposely made keyword-only for now. Noticed a weird related thing with nth on a DataFrame (it just doesn't work, and is kind of undefined); will make a separate issue for that, though: #5552
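As a rough illustration of the `ascending` argument mentioned above, here is a small sketch on a made-up frame (the column name `A` and the data are assumptions). It also shows one way to express head/tail-style selection in terms of `cumcount`; this is meant as an illustration of the relationship, not as a claim about how the PR implements it.

```python
import pandas as pd

# Hypothetical data: group "b" has 3 rows, group "a" has 2 rows.
df = pd.DataFrame({"A": ["b", "a", "b", "a", "b"]})
gb = df.groupby("A")

# Number each row within its group, counting from the start of the group...
forward = gb.cumcount()
# ...or counting down towards the end of the group.
backward = gb.cumcount(ascending=False)

# Selecting rows whose within-group position is below n gives
# head-like and tail-like results in the original frame order.
head_like = df[forward < 2]
tail_like = df[backward < 2]

print(forward.tolist())   # [0, 0, 1, 1, 2]
print(backward.tolist())  # [2, 1, 1, 0, 0]
```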
@@ -474,6 +473,10 @@ def ohlc(self):
        return self._cython_agg_general('ohlc')

    def nth(self, n):
        """
        Return the nth row of each group
The thing I noticed makes this a complete lie. Oops, will delete then merge.
Try again with #5518.
Massive gains in groupby head and tail, plus more tests for these. There is a slight speed improvement in size (not as much as I'd hoped; basically, iterating through grouper.indices is slow :( ).
As mentioned before, I added a helper function to prepend as_index to the index (if that makes any sense). I think it could be faster, and using it in apply may also fix some bugs there...
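To make the remark about size and the group indices a bit more concrete, here is a small sketch using the public `indices` attribute of a groupby object on a made-up frame (the column name `A` and the data are assumptions). It only shows that counting via the indices mapping agrees with `size()`; it does not reproduce the PR's internal code or its timings.

```python
import pandas as pd

# Hypothetical data: group "b" has 3 rows, group "a" has 2 rows.
df = pd.DataFrame({"A": ["b", "a", "b", "a", "b"]})
gb = df.groupby("A")

# `indices` maps each group key to an array of integer row positions.
# Counting by walking this mapping in Python touches every group in turn.
size_via_indices = {key: len(positions) for key, positions in gb.indices.items()}

# `size()` returns the same counts as a Series indexed by group key.
size_builtin = gb.size()

print(size_via_indices)
print(size_builtin)
```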